Vertex Degree Distribution for the Graph of Word Co-Occurrences in Russian

Authors

  • Victor Kapustin
  • Anna Jamsen

Abstract

Degree distributions of word-form co-occurrence graphs are obtained for large Russian text collections. The distributions are fitted well by two power laws, which supports the Dorogovtsev-Mendes model for Russian. A few different Russian text collections were studied, and statistical errors are shown to be negligible. The model exponents for Russian are found to differ from those for English, probably because the collections differ in structure. By contrast, the estimated size of the hypothesized kernel lexicon is almost the same for both languages, which supports the idea that word forms are important for the human perceptual lexicon.
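The abstract does not say how the graph is built or how the two power laws are fitted. The following is a minimal Python sketch under stated assumptions, not the authors' procedure: two word forms are joined by an edge when they appear next to each other in a sentence (a window of one, with no lemmatization, so each inflected form is its own vertex), and the two power laws are fitted as two straight lines in log-log coordinates, with the break point chosen to minimize the total squared error. The helper names `cooccurrence_graph`, `degree_distribution` and `fit_two_power_laws` are hypothetical. In the Dorogovtsev-Mendes model the two regimes are expected to have exponents of about 3/2 below the crossover degree and 3 above it.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_graph(sentences):
    """Undirected graph as {word_form: set of neighbours}.

    Assumption: an edge joins two word forms that are adjacent in a sentence;
    tokens are lower-cased but not lemmatized, so each inflected form is a vertex.
    """
    adj = defaultdict(set)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for a, b in zip(tokens, tokens[1:]):
            if a != b:  # no self-loops
                adj[a].add(b)
                adj[b].add(a)
    return dict(adj)


def degree_distribution(adj):
    """Return {k: P(k)}: the fraction of vertices having degree k."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    return {k: c / n for k, c in sorted(counts.items())}


def _least_squares(xs, ys):
    """Fit y = a + b*x; return (slope b, sum of squared residuals)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return b, sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))


def fit_two_power_laws(dist, min_points=3):
    """Fit P(k) ~ k^-gamma1 for k < k_cross and P(k) ~ k^-gamma2 for k >= k_cross.

    Every split of the sorted degrees into two runs of at least `min_points`
    points is tried; the split with the smallest total squared error in
    log-log space wins. Returns (gamma1, gamma2, k_cross).
    """
    ks = sorted(k for k, p in dist.items() if k > 0 and p > 0)
    xs = [math.log(k) for k in ks]
    ys = [math.log(dist[k]) for k in ks]
    best = None
    for i in range(min_points, len(ks) - min_points + 1):
        b1, e1 = _least_squares(xs[:i], ys[:i])
        b2, e2 = _least_squares(xs[i:], ys[i:])
        if best is None or e1 + e2 < best[0]:
            best = (e1 + e2, -b1, -b2, ks[i])
    if best is None:
        raise ValueError("need at least 2 * min_points distinct degrees")
    return best[1:]


if __name__ == "__main__":
    adj = cooccurrence_graph(["мама мыла раму", "мама мыла окно"])
    print(degree_distribution(adj))  # {1: 0.75, 3: 0.25}
```

A least-squares fit to a log-log histogram is the simplest choice but is known to be biased for heavy-tailed data; logarithmic binning or maximum-likelihood estimators are common alternatives.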


Similar resources

A New Document Embedding Method for News Classification

Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into predefined categories. A great deal of news is spread on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to it. The first step in text classification is to represent documents in a suitable way t...


Some Results on Forgotten Topological Coindex

The forgotten topological coindex (also called the Lanzhou index) is defined for a simple connected graph G as the sum of the terms d_u^2 + d_v^2 over all non-adjacent vertex pairs uv of G, where d_u denotes the degree of the vertex u in G. In this paper, we present some inequalit...
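The definition can be evaluated directly. Below is a minimal Python sketch, assuming the graph is given as a dictionary mapping each vertex to the set of its neighbours (a representation chosen here for illustration); `forgotten_coindex` and `coindex_via_identity` are hypothetical helper names. Summing d_u^2 + d_v^2 over all vertex pairs gives (n - 1) times the sum of d_u^2, and summing it over the edges gives the sum of d_u^3, so the coindex also equals (n - 1)M_1(G) - F(G), where M_1 is the first Zagreb index and F the forgotten index. The sketch computes both ways as a cross-check.

```python
from itertools import combinations


def forgotten_coindex(adj):
    """Sum of d_u^2 + d_v^2 over all unordered non-adjacent pairs {u, v}, u != v.

    `adj` maps each vertex of a simple graph to the set of its neighbours.
    """
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    return sum(deg[u] ** 2 + deg[v] ** 2
               for u, v in combinations(adj, 2)
               if v not in adj[u])


def coindex_via_identity(adj):
    """Same value from (n - 1) * M1(G) - F(G), with M1 = sum d^2 and F = sum d^3."""
    degs = [len(nbrs) for nbrs in adj.values()]
    return (len(degs) - 1) * sum(d ** 2 for d in degs) - sum(d ** 3 for d in degs)


if __name__ == "__main__":
    # Path 1-2-3-4: degrees 1, 2, 2, 1; non-adjacent pairs {1,3}, {1,4}, {2,4}.
    path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
    print(forgotten_coindex(path), coindex_via_identity(path))  # 12 12
```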


On discriminativity of vertex-degree-based indices

A recently published paper [T. Došlić, this journal 3 (2012) 25-34] considers the Zagreb indices of benzenoid systems, and points out their low discriminativity. We show that analogous results hold for a variety of vertex-degree-based molecular structure descriptors that are being studied in contemporary mathematical chemistry. We also show that these results are straightforwardly obtained by u...


Splice Graphs and their Vertex-Degree-Based Invariants

Let G_1 and G_2 be simple connected graphs with disjoint vertex sets V(G_1) and V(G_2), respectively. For given vertices a_1 ∈ V(G_1) and a_2 ∈ V(G_2), a splice of G_1 and G_2 by vertices a_1 and a_2 is defined by identifying the vertices a_1 and a_2 in the union of G_1 and G_2. In this paper, we present exact formulas for computing some vertex-degree-based graph invariants of splices of graphs.
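A minimal Python sketch of the splice construction, again assuming a dictionary-of-neighbour-sets representation; `splice` and `first_zagreb` are illustrative names, not the paper's notation. Identifying a_1 and a_2 gives the merged vertex degree d_{G_1}(a_1) + d_{G_2}(a_2) and leaves every other degree unchanged, so, for example, the first Zagreb index M_1 (the sum of squared degrees) satisfies M_1(G) = M_1(G_1) + M_1(G_2) + 2 d_{G_1}(a_1) d_{G_2}(a_2). This formula follows directly from the definition and is shown only as an illustration; it is not quoted from the paper.

```python
def splice(g1, a1, g2, a2):
    """Splice of g1 and g2 obtained by identifying a1 (in g1) with a2 (in g2).

    Graphs map each vertex to its set of neighbours. Vertices are relabelled
    as (1, v) or (2, v) so the vertex sets stay disjoint; the identified
    vertex is labelled "a".
    """
    g = {}
    for tag, graph, a in ((1, g1, a1), (2, g2, a2)):
        def name(v, tag=tag, a=a):
            return "a" if v == a else (tag, v)
        for u, nbrs in graph.items():
            g.setdefault(name(u), set()).update(name(v) for v in nbrs)
    return g


def first_zagreb(adj):
    """M1(G): sum of squared vertex degrees."""
    return sum(len(nbrs) ** 2 for nbrs in adj.values())


if __name__ == "__main__":
    p3 = {1: {2}, 2: {1, 3}, 3: {2}}   # path 1-2-3
    p2 = {"x": {"y"}, "y": {"x"}}      # single edge x-y
    s = splice(p3, 2, p2, "x")         # star with centre "a" and three leaves
    print(first_zagreb(s), first_zagreb(p3) + first_zagreb(p2) + 2 * 2 * 1)  # 12 12
```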


A Probabilistic Topic Model Based on Local Word Relations in Overlapping Windows

A probabilistic topic model assumes that documents are generated through a process involving topics, and then tries to reverse this process: given the documents, it extracts the topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...
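The generative story that LDA assumes can be written out in a few lines. The Python sketch below follows the usual textbook description in simplified form: the topic-word distributions are fixed inputs rather than drawn from their own Dirichlet prior, and the document length is fixed instead of sampled. `generate_corpus`, `sample_dirichlet` and the toy topics are hypothetical, and the sketch does not reproduce the overlapping-window model that the article itself proposes.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(topics, alpha, n_docs, doc_len, seed=0):
    """Generate documents as LDA assumes.

    topics: list of {word: probability} dicts, one per topic (fixed here).
    alpha:  Dirichlet parameter for the per-document topic proportions.
    """
    rng = random.Random(seed)
    corpus = []
    for _ in range(n_docs):
        theta = sample_dirichlet(alpha, rng)  # topic mixture of this document
        doc = []
        for _ in range(doc_len):
            z = rng.choices(range(len(topics)), weights=theta)[0]  # pick a topic
            words, probs = zip(*topics[z].items())
            doc.append(rng.choices(words, weights=probs)[0])       # pick a word from it
        corpus.append(doc)
    return corpus


if __name__ == "__main__":
    topics = [{"goal": 0.5, "match": 0.5}, {"vote": 0.5, "party": 0.5}]
    for doc in generate_corpus(topics, alpha=[0.5, 0.5], n_docs=3, doc_len=5):
        print(doc)
```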




Publication date: 2007